Subspace models for bottleneck features
Authors
Abstract
The bottleneck (BN) feature, particularly when based on deep structures, has achieved significant success in automatic speech recognition (ASR). However, applying the BN feature to small/medium-scale tasks is nontrivial. An obvious reason is that the limited training data prevent training a complicated deep network; another, more subtle, reason is that the BN feature tends to exhibit high inter-dimensional correlation and is therefore poorly modeled by the conventional diagonal Gaussian mixture model (GMM). This difficulty can be mitigated by increasing the number of Gaussian components and/or employing full covariance matrices. These approaches, however, are not applicable to small/medium-scale tasks for which only a limited amount of training data is available. In this paper, we study the subspace Gaussian mixture model (SGMM) for BN features. The SGMM assumes full but shared covariance matrices, and hence can address the inter-dimensional correlation in a parsimonious way. This is particularly attractive for the BN feature, especially on small/medium-scale tasks, where the inter-dimensional correlation is high but full covariance modeling is not affordable due to the limited training data. Our preliminary experiments on the Resource Management (RM) database demonstrate that the SGMM delivers significant performance improvements for ASR systems based on BN features.
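To make the parsimony argument concrete, a rough parameter count can compare the three covariance structures the abstract discusses. The sketch below is illustrative only: the function name and the example dimensions (40-dimensional BN features, 64 components) are hypothetical, not figures from the paper, and the "shared-full" case captures only the covariance-tying aspect of the SGMM, not its full subspace structure.

```python
# Parameter counts for a GMM over D-dimensional features with K components,
# under three covariance structures (mixture weights omitted for simplicity).
def gmm_params(D, K, cov_type):
    means = K * D  # one D-dimensional mean per component
    if cov_type == "diag":
        cov = K * D                   # one variance per dimension per component
    elif cov_type == "full":
        cov = K * D * (D + 1) // 2    # one full symmetric matrix per component
    elif cov_type == "shared-full":
        cov = D * (D + 1) // 2        # a single full matrix shared by all components
    else:
        raise ValueError(f"unknown covariance type: {cov_type}")
    return means + cov

D, K = 40, 64  # hypothetical: 40-dim BN features, 64 Gaussian components
for ct in ("diag", "full", "shared-full"):
    print(f"{ct:12s} {gmm_params(D, K, ct):6d} parameters")
```

With these numbers, per-component full covariances cost roughly ten times as many parameters as the diagonal model, while a shared full covariance is cheaper than even the diagonal model yet still captures inter-dimensional correlation, which is the trade-off the abstract exploits.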
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for extracting features and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM
The recent success of the convolutional neural network (CNN) in speech recognition is due to its ability to capture translational variance in spectral features while performing discrimination. The CNN architecture requires correlated features as input, and thus the fMLLR transform, which is estimated in a de-correlated feature space, fails to give significant improvement. In this paper, we propose two metho...
Probabilistic linear discriminant analysis with bottleneck features for speech recognition
We have recently proposed a new acoustic model based on probabilistic linear discriminant analysis (PLDA), which enjoys the flexibility of using higher-dimensional acoustic features and is more capable of capturing intra-frame feature correlations. In this paper, we investigate the use of bottleneck features obtained from a deep neural network (DNN) for the PLDA-based acoustic model. Experime...
On Bottleneck Product Rate Variation Problem with Batching
The product rate variation problem minimizes the variation in the rate at which different models of a common base product are produced on the assembly lines, under the assumption of negligible switch-over cost and unit processing time for each copy of each model. The assumption of significant setup and arbitrary processing times forces the problem to become a two-phase problem. The first phase determ...
Improving Drum-Buffer-Rope material flow management with attention to second bottlenecks and free goods in a job shop environment
Drum-Buffer-Rope is a theory-of-constraints production planning methodology that operates by developing a schedule for the system's first bottleneck. The first bottleneck is the bottleneck with the highest utilization. In the theory of constraints, any job that is not processed at the first bottleneck is referred to as a free good. Free goods do not use capacity at the first bottleneck, so very...